All experiments were conducted on a Hadoop computing cluster with 100 CPU cores (Dell PowerEdge R530 servers) and 10 Gbps network bandwidth between nodes; computing times may differ in other computing environments. The strategies for coincidental data discovery using PySpark and Sedona were designed for Hadoop 3.0, Apache Spark 2.4, and Sedona 1.0. File reading and writing in these strategies rely on the Hadoop Distributed File System (HDFS), so the metadata must be uploaded to HDFS before the scripts can be used directly.
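As an illustration of the HDFS requirement, the following is a minimal sketch of how a metadata file might be copied to HDFS with the standard `hdfs dfs` shell commands. The file name `metadata.csv`, the target directory `/user/example/metadata`, and the helper function names are hypothetical placeholders, not paths or functions used by the scripts; replace them with the locations your copy of the scripts expects. By default the helper only returns the commands (a dry run), so it can be inspected on a machine without Hadoop installed.

```python
import shutil
import subprocess
from typing import List


def build_hdfs_upload_commands(local_path: str, hdfs_dir: str) -> List[List[str]]:
    """Return the `hdfs dfs` commands that copy one local file into an HDFS directory."""
    return [
        # Create the target directory (and any parents) if it does not exist yet.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Copy the local file; -f overwrites an existing copy in HDFS.
        ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir],
    ]


def upload_to_hdfs(local_path: str, hdfs_dir: str, dry_run: bool = True) -> List[List[str]]:
    """Build the upload commands and, unless dry_run is set, execute them in order."""
    commands = build_hdfs_upload_commands(local_path, hdfs_dir)
    if not dry_run:
        if shutil.which("hdfs") is None:
            raise RuntimeError("the `hdfs` command was not found on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical paths, shown as a dry run: the commands are printed, not executed.
    for cmd in upload_to_hdfs("metadata.csv", "/user/example/metadata"):
        print(" ".join(cmd))
```

Calling `upload_to_hdfs(..., dry_run=False)` on a cluster node would run the two commands; the scripts can then read the file through its HDFS path.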
The baselines were designed for Python 3.7 and GeoPandas 0.14.
Step-by-step instructions are included in the same directory, in the file "Step by step instruction on reproducing results in our manuscript.docx".